Toward a Formal Framework for Continual Learning

Author

  • Mark B. Ring
Abstract

This paper revisits the continual-learning paradigm I described at the previous workshop in 1995. It presents a framework that formally merges ideas from reinforcement learning and inductive transfer, potentially broadening the scope of each. Most research in RL assumes a stationary (non-changing) world, while research in transfer primarily focuses on supervised learning. Combining the two approaches yields a learning method for an agent that constantly improves its ability to achieve reward in complex, non-stationary environments.

To design a learning algorithm is to make an assumption: that there is structure in the learning task. If there is no structure, then there is no relationship between training data and testing data, there is nothing to be learned, and all learning fails. We therefore assume that there are regularities common to the training and testing data, and we develop algorithms to find and exploit these regularities. In general, we assume there is a (usually stochastic) function f : X → Y that generated both the training and testing output data, which is to say that the task of the learning agent is to discover this function (or a function with sufficiently similar behavior).

1 Inductive Transfer

Inductive transfer works when the functions learned in different tasks can be decomposed into functions over subfunctions. For example, a task a may be to learn some function f^a : X → Y based on (x, y) pairs sampled from a joint distribution over X × Y. It may be that f^a can be decomposed, say, into an ordered set of k functions F^a = {f^a_1, ..., f^a_k : X → ℝ} and a combining function f^a_C : ℝ^k → Y such that

    f^a(x) = f^a_C(f^a_1(x), f^a_2(x), ..., f^a_k(x)),

or, more succinctly, f^a(x) = f^a_C(F^a(x)). The functions f^a_i ∈ F^a form a minimal basis for the mapping f^a : X → Y when (1) f^a(x) is sufficiently well approximated, and (2) no f^a_i is unnecessary. Condition (1) can be achieved in the standard way through the introduction of an appropriate loss function. Condition (2) is achieved when there is no proper subset G ⊂ F^a and combining function g_C : ℝ^{|G|} → Y such that f^a(x) = g_C(G(x)) for all x ∈ X.

Given two tasks, a and b, generated respectively by functions f^a and f^b : X → Y, inductive transfer is possible when there is a minimal basis for each with members in common; i.e., for some sets of basis functions F^a and F^b, there are combining functions f^a_C : ℝ^{|F^a|} → Y and f^b_C : ℝ^{|F^b|} → Y such that

    f^a(x) = f^a_C(F^a(x)),
    f^b(x) = f^b_C(F^b(x)),
    F^a ∩ F^b ≠ ∅.

Functions that can share subfunctions in this way are referred to here as transfer-compatible functions. Conditions (1) and (2) are fairly weak if the combining functions can be of arbitrary complexity, though they are fairly strong if the combining functions are, for example, linear. Intuitively, it seems likely that the functional and structural similarity [3] of f^a and f^b (and hence the potential benefit of transfer from one task to the other) increases as |F^a ∩ F^b| / |F^a ∪ F^b| increases and the VC dimension of the combining functions decreases. Inductive transfer is therefore the search for common bases, as was done, for example, quite explicitly by Baxter [1], and implicitly in the case of multi-task learning (as for example by Caruana [2]).
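To make the shared-basis formulation concrete, here is a minimal sketch (my own illustration, not code from the paper): two hypothetical regression tasks are generated from the same basis functions but use task-specific linear combining functions, so F^a ∩ F^b ≠ ∅ and transfer amounts to reusing the shared feature map. The particular basis functions, toy data, and least-squares fitting are all assumptions chosen for illustration.

```python
# Minimal sketch (illustrative assumptions, not from the paper): two tasks that
# are transfer compatible because they share basis functions F = {f_1, ..., f_k},
# each task using its own linear combining function f_C : R^k -> Y.
import numpy as np

# Shared basis functions f_i : X -> R (the overlap F^a ∩ F^b).
basis = [np.sin, np.cos, lambda x: x, lambda x: x ** 2]

def features(x):
    """Evaluate every basis function on x, giving F(x) in R^k for each input."""
    return np.stack([f(x) for f in basis], axis=1)

def fit_combiner(x, y):
    """Least-squares fit of a linear combining function over the shared basis."""
    return np.linalg.lstsq(features(x), y, rcond=None)[0]

rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, size=200)

# Two tasks generated from the same basis with different combining weights.
y_a = 1.5 * np.sin(x) - 0.5 * x ** 2 + 0.05 * rng.standard_normal(200)
y_b = -2.0 * np.cos(x) + 1.0 * x + 0.05 * rng.standard_normal(200)

w_a = fit_combiner(x, y_a)   # task a's combining function f^a_C
w_b = fit_combiner(x, y_b)   # task b's combining function f^b_C

print("task a weights:", np.round(w_a, 2))
print("task b weights:", np.round(w_b, 2))
print("task a fit MSE:", round(float(np.mean((features(x) @ w_a - y_a) ** 2)), 4))
```

Here the combining functions are deliberately linear (the "fairly strong" case noted above) and the basis is fixed in advance; in Baxter- or Caruana-style transfer the shared basis itself would be learned jointly across tasks.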
2 Reinforcement Learning

In the standard reinforcement-learning framework (cf. Sutton and Barto, 1998), a learning agent interacts with a Markov decision process (MDP) over a series of time steps t ∈ {0, 1, 2, ...}. At each time step the agent takes an action a_t ∈ A in its current state s_t ∈ S and receives a reward r_t ∈ ℝ. The dynamics underlying the environment are described as an MDP with state-to-state transition probabilities P^a_{ss′} = Pr{s_{t+1} = s′ | s_t = s, a_t = a} and expected rewards R^a_s = E{r_{t+1} | s_t = s, a_t = a}. The agent's decision-making process is described by a policy, π(s, a) = Pr{a_t = a | s_t = s}, which the agent refines through repeated interaction with the environment so as to increase E_{π,s_0}[r_0 + r_1 + r_2 + ⋯], the reward it can expect to receive if it follows policy π from state s_0. Alternatively, the agent may sample observations o ∈ O related to the current state (possibly stochastically), where s_t may or may not be uniquely identified by the current observation o_t, perhaps when taken in combination with previous observations and actions (o_0, a_0, o_1, a_1, ..., o_{t−1}, a_{t−1}).

3 Continual Learning

In traditional reinforcement learning, the world is modeled as a stationary MDP: fixed dynamics and states that can recur infinitely often. The agent's learning "task" is to improve performance by improving its policy, which generally entails developing an estimate of the expected cumulative reward attainable from individual states (the state-value function) or from state-action pairs (the action-value function). Augmenting RL with concepts from inductive transfer quickly hits a snag: small changes to the structure of the state space (especially small changes to the placement of rewards) can introduce major changes in the value function. One solution is to model the relationships between the states separately from the value function and then, when the rewards change, recalculate the value function from the model using dynamic-programming methods. Predictable changes in the relationships between the states, however, are more difficult to capture. The alternative explored here is to step away from the MDP foundation of RL and instead describe the environment in terms of regular relationships between history and future.

3.1 The Continual-Learning Problem, Formally

A continual-learning agent's inputs are observations, o_t ∈ O. The agent may learn from all its past experiences, collectively known as its history. Each moment of history is a triple, m = (o, r, a) ∈ M, where M = O × ℝ × A and a ∈ A is an agent action. In the discrete-time case, a history is a discrete series of moments, h(t) = (m_0, ..., m_t), while in the continuous-time case, the history is a continuous function of time, h : [0, t] → M. The future is slightly different from the past in that it is not yet known. Instead, there are probability distributions over possible futures contingent on the policy. Each possible future is an infinite-length trajectory of moments; in the continuous case it can be represented as a function mapping time to possible future moments, ξ : (t, ∞) → M. The set of all possible futures is Z, and for each policy and history there is a probability distribution D over Z; i.e., D(ξ | h ∈ H, π ∈ Π), where H is the space of all possible histories and Π is the space of all possible policies. The agent's aim is to maximize its expected return by estimating the reward part of this distribution and finding the policy with greatest expected reward. If R(m) is the reward part of a moment, then the expected return for a particular future is …
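As a rough illustration of the objects defined in Section 3.1 (my own sketch under assumed toy dynamics, not an implementation from the paper), the code below represents moments m = (o, r, a), accumulates a history, and estimates a policy's expected return by sampling truncated futures from a simple non-stationary environment. The environment, the finite horizon, and the discount factor are all assumptions standing in for the infinite-length futures and the distribution D(ξ | h, π).

```python
# Minimal sketch (assumptions throughout): moments m = (o, r, a), a history of
# moments, and a Monte Carlo estimate of a policy's expected return obtained by
# sampling truncated futures from a toy non-stationary environment.
import random
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Moment:
    o: float   # observation o in O
    r: float   # reward r in R
    a: int     # action a in A

History = List[Moment]
Policy = Callable[[History], int]          # maps a history to an action

def toy_environment(history: History, action: int, t: int) -> Moment:
    """Assumed toy dynamics: the observation drifts with time (non-stationary),
    and reward depends on matching the action to the sign of the drift."""
    drift = 1.0 if (t // 50) % 2 == 0 else -1.0   # dynamics flip every 50 steps
    observation = drift + random.gauss(0.0, 0.1)
    reward = 1.0 if (action == 1) == (drift > 0) else -1.0
    return Moment(observation, reward, action)

def sample_future(history: History, policy: Policy, horizon: int = 100) -> List[Moment]:
    """Sample one truncated future trajectory xi under the given policy."""
    h = list(history)
    future = []
    for t in range(len(history), len(history) + horizon):
        a = policy(h)
        m = toy_environment(h, a, t)
        h.append(m)
        future.append(m)
    return future

def expected_return(history: History, policy: Policy,
                    n_samples: int = 200, gamma: float = 0.95) -> float:
    """Monte Carlo estimate of the expected (discounted) return over futures."""
    total = 0.0
    for _ in range(n_samples):
        future = sample_future(history, policy)
        total += sum(gamma ** i * m.r for i, m in enumerate(future))
    return total / n_samples

# A simple history-dependent policy: take action 1 if the latest observation is positive.
def reactive_policy(h: History) -> int:
    return 1 if (h and h[-1].o > 0) else 0

print("estimated return:", round(expected_return([], reactive_policy), 2))
```

In the paper's terms, sample_future draws truncated trajectories ξ from the distribution over futures given the history and policy, and expected_return averages the reward part R(m) of the sampled moments; comparing this estimate across candidate policies is one naive way to search for the policy with greatest expected reward.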

Publication date: 2005